5 research outputs found

    Multi-view representation learning for natural language processing applications

    The pervasion of machine learning in a vast number of applications has given rise to an increasing demand for the effective processing of complex, diverse and variable datasets. One representative case of data diversity can be found in multi-view datasets, which contain input originating from more than one source or having multiple aspects or facets. Examples include, but are not restricted to, multimodal datasets, where data may consist of audio, image and/or text. The nature of multi-view datasets calls for special treatment in terms of representation. A subsequent fundamental problem is that of combining information from potentially incoherent sources, commonly referred to as view fusion. Quite often, the heuristic solution of early fusion is applied to this problem: aggregating representations from different views using a simple function (concatenation, summation or mean pooling). However, early fusion can cause overfitting when training samples are small, and it may also result in the specific statistical properties of each view being lost in the learning process. Representation learning, the set of ideas and algorithms devised to learn meaningful representations for machine learning problems, has recently grown into a vibrant research field that encompasses multiple-view setups. A plethora of multi-view representation learning methods has been proposed in the literature, a large portion of them based on the idea of maximising the correlation between the available views. Commonly, such techniques are evaluated on synthetic datasets or strictly defined benchmark setups; within Natural Language Processing, that role is often assumed by the multimodal sentiment analysis problem.
This thesis argues that more complex downstream applications could benefit from such representations and, setting out to explore the limits of the apparent applicability of multi-view representation learning, describes a multi-view treatment of a range of tasks, from static, two-view, unimodal to dynamic, three-view, trimodal applications. More specifically, we experiment with document summarisation, framing it as a multi-view problem where documents and summaries are considered two separate, textual views. Moreover, we present a multi-view inference algorithm for the bimodal problem of image captioning. Delving further into multimodal setups, we develop a set of multi-view models for applications pertaining to videos, including tagging and text generation tasks. Finally, we introduce narration generation, a new text generation task over movie videos that requires inference at the storyline level and temporal, context-based reasoning. The main argument of the thesis is that, owing to their performance, multi-view representation learning tools warrant serious consideration by the researchers and practitioners of the Natural Language Processing community. Exploring the limits of multi-view representations, we investigate their fitness for Natural Language Processing tasks and show that they are able to hold the information required for complex problems, while being a good alternative to the early fusion paradigm.
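The early-fusion baseline that the abstract contrasts against can be sketched in a few lines. This is a minimal illustration, not code from the thesis; the feature vectors and dimensions are hypothetical.

```python
import numpy as np

def early_fusion(views, mode="concat"):
    """Combine per-view feature vectors with a simple aggregate.

    `views` is a list of 1-D numpy arrays, one per view (e.g. audio,
    image, text features). For "sum" and "mean", all views must share
    the same dimensionality.
    """
    if mode == "concat":
        return np.concatenate(views)
    if mode == "sum":
        return np.sum(views, axis=0)
    if mode == "mean":
        return np.mean(views, axis=0)
    raise ValueError(f"unknown fusion mode: {mode}")

# Hypothetical 4-dimensional text and image feature vectors.
text = np.array([1.0, 0.0, 2.0, 1.0])
image = np.array([0.0, 1.0, 1.0, 3.0])

fused_concat = early_fusion([text, image], "concat")  # 8-dim vector
fused_mean = early_fusion([text, image], "mean")      # 4-dim vector
```

Note that concatenation preserves each view's coordinates but doubles the dimensionality, whereas summation and mean pooling keep the dimensionality but blend the views' statistics; the abstract's point is that either choice can discard view-specific structure.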

    Canonical Correlation Inference for Mapping Abstract Scenes to Text

    We describe a technique for structured prediction, based on canonical correlation analysis. Our learning algorithm finds two projections for the input and the output spaces that aim at projecting a given input and its correct output into points close to each other. We demonstrate our technique on a language-vision problem, namely the problem of giving a textual description to an "abstract scene". Comment: 10 pages, accepted to AAAI 201

    The SUMMA Platform Prototype

    We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.
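The staged architecture described here (speech recognition feeding translation, then entity tagging) can be sketched as a simple processing chain. The stage functions below are hypothetical placeholders standing in for the real components, not the SUMMA implementation.

```python
# A minimal sketch of a media-monitoring pipeline in the spirit of the
# platform described above. Each stage is a trivial stand-in.

def asr(stream_item):
    # Placeholder: the real system runs automatic speech recognition.
    return stream_item["transcript"]

def translate(text, target="en"):
    # Placeholder: the real system runs machine translation.
    return text

def tag_entities(text):
    # Placeholder: the real system tags and classifies named entities;
    # here we naively treat capitalised tokens as entities.
    return [tok for tok in text.split() if tok.istitle()]

def process(stream_item):
    text = asr(stream_item)
    text = translate(text)
    return {"text": text, "entities": tag_entities(text)}

item = {"transcript": "Reuters reports from Edinburgh today"}
result = process(item)
# result["entities"] == ["Reuters", "Edinburgh"]
```

In the actual platform each stage would be a separate Docker service consuming a shared media stream, which is what makes the pipeline easy to scale and customise.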

    Relation extraction between entities from the "TA NEA" newspaper archive using non-supervised techniques

    No full text
    The present thesis studies and develops a system that extracts relations between entities from large, unstructured, multiple-topic corpora, using unsupervised learning techniques. The system follows the open relation extraction paradigm; hence it requires no input data beyond the text corpus itself. Relation extraction consists in the systematic extraction of triples (e1, r, e2), where e1, e2 denote entities and r the (verbal) relation that connects them. The system addresses texts written in the Greek language. The corpus used for implementation and evaluation was the archive of the Greek newspaper "ΤΑ ΝΕΑ", a choice that provided a large body of text of varied topics and structure.
The system first extracts a large number of candidate relations from the input text using syntactic parsing techniques; each candidate is then classified as positive (semantically valid) or negative by a classifier. The classifier is trained on a set of labelled data produced by applying a set of rules.
    Νικόλαος Π. Παπασαραντόπουλος · Γεώργιος Γ. Θεοφίλο
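The two-stage idea described above, extract candidate (e1, r, e2) triples, then filter them with a classifier, can be sketched as follows. This is a toy English-language illustration: the pattern, the rule-based filter, and the example corpus are all invented stand-ins for the thesis's Greek syntactic parser and trained classifier.

```python
import re

# Shallow stand-in for the parser: a capitalised token, a verb-like
# token ending in "s", and another capitalised token.
TRIPLE = re.compile(r"([A-Z][a-z]+) (\w+s) ([A-Z][a-z]+)")

def extract_triples(text):
    """Extract candidate (e1, relation, e2) triples from raw text."""
    return [m.groups() for m in TRIPLE.finditer(text)]

def accept(triple):
    # Stand-in for the trained classifier that separates semantically
    # valid triples from spurious ones (hypothetical rule).
    e1, rel, e2 = triple
    return rel not in {"is", "was"}

corpus = "Alice manages Bob. Carol visits Athens."
triples = [t for t in extract_triples(corpus) if accept(t)]
# triples == [("Alice", "manages", "Bob"), ("Carol", "visits", "Athens")]
```

The design mirrors the thesis's bootstrapping trick: a small set of hand-written rules labels training data cheaply, and the classifier trained on those labels generalises beyond what the rules cover.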